THE REINSURANCE ACTUARY
  • Blog
  • Project Euler
  • Category Theory
  • Disclaimer

Data Science, Machine Learning, Data Mining... What do they mean exactly?

14/9/2016

 


"I don't know what you mean by 'glory,' " Alice said.
Humpty Dumpty smiled contemptuously. "Of course you don't—till I tell you. I meant 'there's a nice knock-down argument for you!' "
"But 'glory' doesn't mean 'a nice knock-down argument'," Alice objected.
"When I use a word," Humpty Dumpty said, in rather a scornful tone, "it means just what I choose it to mean—neither more nor less."
"The question is," said Alice, "whether you can make words mean so many different things."
"The question is," said Humpty Dumpty, "which is to be master—that's all."

I don't think Lewis Carroll had 'Big Data' or 'Machine Learning' in mind when he penned these words, but I think the quote is quite apt in this context. All too often these buzzwords fall foul of the Humpty Dumpty principle: they mean just what the speaker chooses them to mean, regardless of what the words actually mean to anyone else. So what do these terms actually mean?

Machine Learning

The field of study which investigates algorithms that give computers the ability to learn without being explicitly programmed.
 
What do we mean by ‘learn’ in this context? The definition used by Machine Learning practitioners, originally stated by Tom Mitchell, is:

 "A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P if its performance at tasks in T, as measured by P, improves with experience E."
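To make the definition concrete, here is a minimal Python sketch (not part of the original definition). It assumes a toy task T of deciding whether an integer score is at or above a hidden cut-off, experience E in the form of labelled examples, and performance measure P as accuracy on a fixed test set. The names `learn_threshold` and `accuracy`, the cut-off of 50, and the example data are all hypothetical and chosen purely for illustration.

```python
# Toy illustration of "learning from experience".
# Assumptions (hypothetical, for illustration only):
#   T: decide whether an integer score is at or above a hidden cut-off
#   E: labelled examples of the form (score, is_positive)
#   P: accuracy on a fixed test set

HIDDEN_CUTOFF = 50  # used only to label data; the learner never reads it


def true_label(score):
    return score >= HIDDEN_CUTOFF


def learn_threshold(examples):
    """Estimate the cut-off as the midpoint between the largest negative
    score and the smallest positive score seen so far."""
    negatives = [score for score, label in examples if not label]
    positives = [score for score, label in examples if label]
    if not negatives or not positives:
        raise ValueError("need at least one example of each class")
    return (max(negatives) + min(positives)) / 2


def accuracy(threshold, test_scores):
    """Fraction of test scores that the learned threshold labels correctly."""
    correct = sum((score >= threshold) == true_label(score) for score in test_scores)
    return correct / len(test_scores)


# Experience arrives as a growing list of labelled examples.
experience = [(10, False), (60, True), (30, False), (55, True), (45, False), (52, True)]
test_scores = range(100)

accuracies = []
for n in (2, 4, 6):
    threshold = learn_threshold(experience[:n])
    accuracies.append(accuracy(threshold, test_scores))

print(accuracies)  # [0.85, 0.93, 0.99]
```

With two, four, and six examples the accuracy rises from 0.85 to 0.93 to 0.99, which is the sense in which performance at T, as measured by P, improves with E.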

So what problems can Machine Learning algorithms be applied to? The main advances in machine learning have been in the following areas:
  • Classification – classifying data items into groups based on a training set. For example, a computer given a set of emails and told which are spam and which are not spam will be able to use machine learning to classify new emails as either spam or non-spam.
  • Cluster Analysis – identifying similarities in items in data sets without being explicitly told what to look for.
  • Computer vision – teaching computers to understand what they are seeing from visual inputs.
  • Object recognition – a combination of the three areas above, in which a computer is able to correctly recognise objects from visual inputs.
  • Natural Language Processing – being able to correctly interpret natural languages.
  • Search Engines – taking human input to a search engine and suggesting appropriate results.
  • Speech and handwriting recognition – translating speech and handwriting into written text.
​
A trait shared by all these problems is that computers were previously thought to be incapable of tackling them. This is one reason why Machine Learning is such an exciting and growing field of study.
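The spam example from the list above can be sketched in a few lines of Python. This is only an illustration: it uses a naive Bayes word-count model, which is one of many possible classification techniques, and the training emails and the function names `train` and `classify` are made up for the example.

```python
import math
from collections import Counter


def train(emails):
    """Count words per class from a list of (text, spam) pairs."""
    word_counts = {True: Counter(), False: Counter()}
    class_counts = Counter()
    for text, spam in emails:
        class_counts[spam] += 1
        word_counts[spam].update(text.lower().split())
    return word_counts, class_counts


def classify(text, model):
    """Return True if the email looks more like spam than non-spam."""
    word_counts, class_counts = model
    vocabulary = set(word_counts[True]) | set(word_counts[False])
    n_emails = sum(class_counts.values())
    scores = {}
    for label in (True, False):
        total_words = sum(word_counts[label].values())
        # Start from the log prior for the class.
        score = math.log(class_counts[label] / n_emails)
        for word in text.lower().split():
            # Laplace smoothing so that unseen words do not give log(0).
            count = word_counts[label][word] + 1
            score += math.log(count / (total_words + len(vocabulary)))
        scores[label] = score
    return scores[True] > scores[False]


# Hypothetical training set: the computer is told which emails are spam.
training_emails = [
    ("win money now", True),
    ("cheap money offer", True),
    ("win a free prize now", True),
    ("meeting agenda for monday", False),
    ("claims report attached", False),
    ("lunch on monday", False),
]

model = train(training_emails)
print(classify("free money now", model))         # True
print(classify("monday meeting report", model))  # False
```

Nothing in `classify` mentions specific spam words; the rule is derived entirely from the labelled training emails, which is what "without being explicitly programmed" refers to.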

If you'd like to know more about Machine Learning, Andrew Ng at Stanford University has released a very good free online course through Coursera, available at the following link:
https://www.coursera.org/learn/machine-learning

Big Data

Big Data can be defined in terms of the 3Vs: it is data available at a higher volume, higher velocity (the rate at which data is generated) and/or greater variety than normal data sources.
 
For example, an insurance company's claims data would not count as Big Data: the volume will be fairly low, the velocity slow, and the variety limited, since the data is fairly uniform.
 
The browsing patterns of visitors to an aggregator website, on the other hand, would count as Big Data. For example, the amount of time someone spends on Comparethemarket.com, their clicks, what they search for, how many searches they make, how often they return to the website before making a purchase, etc. would count as Big Data. There would be a massive volume of data to analyse and the data would be available in real time. (It wouldn’t meet the variety criterion, but that’s not a necessary condition.)
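As a rough illustration of what this browsing data might look like, the Python sketch below summarises a stream of click events one at a time, without holding the whole stream in memory. The event format `(user_id, event_type, seconds_on_page)`, the sample events, and the function name `summarise` are assumptions made up for this example and do not describe any real website's data.

```python
from collections import defaultdict

# Hypothetical click-stream events: (user_id, event_type, seconds_on_page)
events = [
    ("u1", "search", 30),
    ("u2", "search", 12),
    ("u1", "click", 45),
    ("u1", "search", 20),
    ("u2", "purchase", 60),
    ("u1", "purchase", 90),
]


def summarise(stream):
    """Build a per-user summary in a single pass over the event stream."""
    summary = defaultdict(
        lambda: {"events": 0, "searches": 0, "seconds": 0, "purchased": False}
    )
    for user_id, event_type, seconds in stream:
        user = summary[user_id]
        user["events"] += 1
        user["seconds"] += seconds
        if event_type == "search":
            user["searches"] += 1
        elif event_type == "purchase":
            user["purchased"] = True
    return dict(summary)


summary = summarise(events)
print(summary["u1"])
# {'events': 4, 'searches': 2, 'seconds': 185, 'purchased': True}
```

Because `summarise` accepts any iterable, the same function could in principle be fed events as they arrive rather than a list built in advance.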
 
Because of the need to extract useful information from Big Data, and the difficulties created by the 3Vs, we cannot rely on traditional methods of data analysis. Given the volume and velocity of Big Data, we require methods of analysis that do not need to be programmed explicitly; this is where Machine Learning fits in. Machine Learning in the guise of speech and handwriting recognition can also be important if the data generated is in audio form but needs to be combined with other data.

Data Mining

Data Mining is a catch-all term for the process of analysing and summarising data into useful information. The data may be in the form of Big Data, and the methods used may be based on Machine Learning (where the algorithm learns from the data) or may be more traditional.

Data Visualisation

Data Visualisation is the process of creating visual graphics that aid in understanding and exploring data. It has become increasingly important for two reasons. Firstly, the growth in the volume of data means that new methods are required to understand it. Secondly, an increase in computing power means that more advanced visualisation techniques are now possible.
​
Data Science

Data Science is a broad term which encompasses processes which aim to extract knowledge or insight from data. Data Science therefore includes all the previous fields.
 
For example, in carrying out an analysis, we first collect our data, which may or may not be in the form of Big Data. We then mine the data, possibly using Machine Learning, and finally present our results through Data Visualisation.
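The three stages can be sketched as a tiny Python pipeline. Everything here is hypothetical: the claim records, the function names `collect`, `mine`, and `visualise`, and the choice of a text bar chart are assumptions for illustration, and the mining step is a plain average rather than Machine Learning.

```python
from collections import defaultdict


def collect():
    """Stage 1: gather the data (hypothetical claim records)."""
    return [
        ("motor", 1200),
        ("property", 5400),
        ("motor", 800),
        ("liability", 3000),
        ("property", 4600),
    ]


def mine(records):
    """Stage 2: summarise the data as the average claim per line of business."""
    amounts = defaultdict(list)
    for line_of_business, amount in records:
        amounts[line_of_business].append(amount)
    return {line: sum(values) / len(values) for line, values in amounts.items()}


def visualise(averages, scale=500):
    """Stage 3: present the result as a text bar chart, largest average first."""
    rows = []
    ordered = sorted(averages.items(), key=lambda item: item[1], reverse=True)
    for line, average in ordered:
        bar = "#" * round(average / scale)  # one '#' per `scale` units
        rows.append(f"{line:<10} {bar} {average:.0f}")
    return "\n".join(rows)


averages = mine(collect())
chart = visualise(averages)
print(chart)
```

Each stage only depends on the output of the one before it, so the mining step could be swapped for a Machine Learning method without touching collection or visualisation.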


    Author

    ​​I work as an actuary and underwriter at a global reinsurer in London.

    I mainly write about Maths, Finance, and Technology.
    ​
    If you would like to get in touch, then feel free to send me an email at:

    ​LewisWalshActuary@gmail.com
