THE REINSURANCE ACTUARY
  • Blog
  • Project Euler
  • Category Theory
  • Disclaimer

An Actuary learns Machine Learning - Part 12 - Kaggle Tabular Playground Competition - June 22

24/6/2022

 

Picture

​

​In which we start a new Kaggle competition, submit a dummy attempt, and then build a very basic Excel model to establish a baseline for future progress.
​




​Source: https://somewan.design


In the next few posts I’ll be writing up my attempts at the June 22 Kaggle Tabular Playground competition. You can read about the competition here:

https://www.kaggle.com/competitions/tabular-playground-series-jun-2022/overview

Attempt 1

Let’s start by submitting all 0s, to establish a baseline against which we can measure progress as we increase the sophistication of our model. This is the dummy submission file provided by Kaggle, so there is some logic to using 0 rather than another number. Doing so gives us the following score:

Picture
A public score of 1.42282, which places us joint 646th out of 681 on the leaderboard. Interestingly, there are a few people who have done substantially worse than this. The leaderboard tracks your best attempt, so some people have only submitted models which score worse than the dummy submission provided by Kaggle as part of the initial data download, yikes!
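For anyone who would rather build the all-zero file in code than upload Kaggle's copy, here is a minimal sketch in pandas. The column names `row-col` and `value`, the example ids, and the helper name `make_constant_submission` are assumptions for illustration, so check them against the actual sample submission file before relying on this.

```python
import pandas as pd


def make_constant_submission(sample: pd.DataFrame,
                             value_col: str = "value",
                             fill: float = 0.0) -> pd.DataFrame:
    """Return a copy of the sample submission with every prediction set to `fill`."""
    submission = sample.copy()
    submission[value_col] = fill
    return submission


# Tiny stand-in for the sample submission file (ids and column names are hypothetical).
sample = pd.DataFrame({
    "row-col": ["0-F_1", "0-F_2", "1-F_1"],
    "value": [0.5, 0.5, 0.5],
})

baseline = make_constant_submission(sample)

# In practice you would read the real file and write the result back out, e.g.:
# sample = pd.read_csv("sample_submission.csv")
# baseline.to_csv("submission.csv", index=False)
```

Keeping the fill value as a parameter makes it easy to try another constant later and compare it against the 0 baseline.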
 
Attempt 2

Now let’s build our first proper model. I’m still a fan of prototyping projects in Excel, which is possibly a habit I should try to get out of, but it’s still where I’m most comfortable.

For our first model, I’m going to use the median of a given column to impute the missing values. This is probably the most basic method we can use which is still based on some sort of statistical reasoning. Other options along the same lines would be to use the mean or mode of a given column, which we could try later, but for now I'm sticking with the median.
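The same idea can be sketched in pandas, as a rough equivalent of the spreadsheet approach rather than the exact model used here. The column names `F_1` and `F_2`, the toy values, and the helper name `impute_with_column_median` are made up for illustration.

```python
import numpy as np
import pandas as pd


def impute_with_column_median(df: pd.DataFrame) -> pd.DataFrame:
    """Fill each missing value with the median of its own column."""
    # median() skips NaNs by default, so each median is based on observed values only.
    medians = df.median(numeric_only=True)
    # fillna with a Series matches on column name, so each column gets its own median.
    return df.fillna(medians)


# Toy data with one gap per column (hypothetical column names).
data = pd.DataFrame({
    "F_1": [1.0, np.nan, 3.0, 10.0],
    "F_2": [np.nan, 2.0, 4.0, 6.0],
})

imputed = impute_with_column_median(data)

# The mean or mode variants mentioned above would only change the fill values:
# data.fillna(data.mean(numeric_only=True))
# data.fillna(data.mode().iloc[0])
```

In the toy data the observed values of `F_1` are 1, 3 and 10, so its gap is filled with 3 rather than the mean of about 4.67, which shows how the median is less affected by a single large value.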
​
Let’s open the data in Excel, add in the median for each column, and then, for each missing value, look up the relevant median:
Picture
Pretty straightforward to set up. Now let’s submit this and see how much of an improvement it gives us:
Picture
A very marginal improvement, 1.423 -> 1.417, but an improvement nonetheless. We are now up to 635 out of 681, so there is still quite some way to go.

Conclusion

That’s all for this time. There are a couple of clear next steps. First, we need to build up some intuition on the dataset to guide us towards selecting appropriate models and approaches; we haven't really done any EDA at all yet. Then we need to get the data into Python to speed up the workflow, so that we can try out some more powerful models.



    Author

    ​​I work as an actuary and underwriter at a global reinsurer in London.

    I mainly write about Maths, Finance, and Technology.
    ​
    If you would like to get in touch, then feel free to send me an email at:

    ​LewisWalshActuary@gmail.com
